Hertuda results for OAEI 2012
Abstract
Hertuda is a very simple element-based matcher. It shows that tokenization combined with a string measure can also yield good results. It is an improved version of the system first submitted to OAEI 2011.5.

1 Presentation of the system

1.1 State, purpose, general statement

Hertuda is a first idea of an element-based matcher with a string comparison. It generates only homogeneous matchings that are compatible with OWL Lite/DL. This means that classes, data properties and object properties are handled separately. As a result, there are three thresholds that can be set independently: one for class to class, one for object property to object property, and one for data property to data property. A simple overall threshold sets all sub-thresholds to the same value. The cross product over all concepts is computed; if the confidence of a comparison is higher than the threshold for this type of matching, the correspondence is added to the resulting alignment.

For each concept all labels, comments and URI fragments are extracted. These terms form a set. To compare two concepts, i.e. two sets of terms, each element of the first set is compared with each element of the second set. The best value is the similarity measure for these concepts.

As a preprocessing step for the term comparison, each term is tokenized: camel-case terms and terms containing underscores or hyphens are split into single tokens and converted to lower case. Therefore writePaper, write-paper and write paper all result in the same two tokens, namely {write} and {paper}. Afterwards a similarity matrix is computed with the Damerau-Levenshtein distance [1, 2]. The average of the best token mappings is then returned as the similarity between two token sets (a sketch of both steps is given at the end of this section). Figure 1 depicts the algorithm of Hertuda schematically.

1.2 Specific techniques used

The final matching system consists of the string matching approach and a filter for removing alignments that are not considered in the reference alignment. The system is depicted in Figure 2. The filter removes all correspondences that are correct but not contained in the reference alignment; the removed mappings are mostly from upper-level ontologies such as Dublin Core or Friend of a Friend.

void function hertuda() {
    for each type in {class, data property, object property}
        for each concept cOne in ontology one
            for each concept cTwo in ontology two
                if (compareConcepts(cOne, cTwo) > threshold(type)) {
                    add alignment between cOne and cTwo
                }
}

float compareConcepts(Concept cOne, Concept cTwo) {
    for each termOne in {label(cOne), comment(cOne), fragment(cOne)}
        for each termTwo in {label(cTwo), comment(cTwo), fragment(cTwo)}
            conceptsMatrix[termOne, termTwo] = compareTerms(termOne, termTwo)
    return maximumOf(conceptsMatrix)
}

float compareTerms(String tOne, String tTwo) {
    tokensOne = tokenize(tOne)
    tokensTwo = tokenize(tTwo)
    tokensOne = removeStopwords(tokensOne)
    tokensTwo = removeStopwords(tokensTwo)
    for each x in tokensOne
        for each y in tokensTwo
            similarityMatrix[x, y] = damerauLevenshtein(x, y)
    return bestAverageScore(similarityMatrix)
}

Fig. 1. Algorithm for Hertuda

1.3 Adaptations made for the evaluation

No specific adaptations were made. The overall threshold for the normalised Damerau-Levenshtein distance is set to 0.88.

1.4 Link to the system and parameters file

The tool version submitted to OAEI 2012 can be downloaded from http://www.ke.tu-darmstadt.de/resources/ontology-matching/hertuda

Fig. 2. Composition of matching algorithms of Hertuda. The string-based approach and the filter are composed sequentially.
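As an illustration of the tokenization step described in Section 1.1, a minimal Python sketch could look as follows. It is an independent re-implementation for illustration only, not the code of the submitted tool; the regular expressions are assumptions.

import re

def tokenize(term):
    # Split camel case, then underscores, hyphens and whitespace; lower-case everything.
    term = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", term)   # writePaper -> write Paper
    return [t.lower() for t in re.split(r"[\s_\-]+", term) if t]

# writePaper, write-paper and write paper all yield the same tokens {write, paper}.
print(tokenize("writePaper"))    # ['write', 'paper']
print(tokenize("write-paper"))   # ['write', 'paper']
print(tokenize("write paper"))   # ['write', 'paper']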
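The token-set comparison from Fig. 1 (a similarity matrix built with the Damerau-Levenshtein distance, followed by the average of the best mappings) could look roughly as sketched below. The restricted (optimal string alignment) variant of the distance, the symmetric averaging in best_average_score and the tiny stop-word list are assumptions made for this sketch; only the normalisation to [0, 1] and the threshold of 0.88 come from Section 1.3.

STOP_WORDS = {"the", "a", "an", "of", "and"}  # assumed example list

def damerau_levenshtein(a, b):
    # Restricted Damerau-Levenshtein (optimal string alignment) distance.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def similarity(a, b):
    # Edit distance normalised to a similarity in [0, 1].
    if not a and not b:
        return 1.0
    return 1.0 - damerau_levenshtein(a, b) / max(len(a), len(b))

def best_average_score(tokens_one, tokens_two):
    # Average of the best similarity each token achieves in the other set.
    tokens_one = [t for t in tokens_one if t not in STOP_WORDS]
    tokens_two = [t for t in tokens_two if t not in STOP_WORDS]
    if not tokens_one or not tokens_two:
        return 0.0
    best_one = [max(similarity(x, y) for y in tokens_two) for x in tokens_one]
    best_two = [max(similarity(x, y) for x in tokens_one) for y in tokens_two]
    return (sum(best_one) + sum(best_two)) / (len(best_one) + len(best_two))

# With stop words removed, "has the first name" and "hasFirstName" (tokenised to
# [has, the, first, name] and [has, first, name]) score 1.0, above the threshold of 0.88.
print(best_average_score(["has", "the", "first", "name"], ["has", "first", "name"]))  # 1.0

In the actual matcher this score is the value that compareTerms returns and that is compared against the per-type threshold, as shown in Fig. 1.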
2 Results

2.1 Benchmark

The implemented approach is purely string based and works on the element level, so missing labels or comments, or terms replaced by random strings, have a strong effect on the matching algorithm.

2.2 Anatomy

Hertuda has a higher recall than the StringEquiv baseline from OAEI 2011.5 (0.673 vs. 0.622). Because of the tokenization and the string distance, the precision is much lower (0.69 vs. 0.997). This yields a worse F-measure for Hertuda (0.681 vs. 0.766).

2.3 Conference

The first version of Hertuda only compares the tokens for equality, whereas the new version computes a string similarity. The recall is slightly higher than in the first version, but the precision is lower. All in all, the F-measure has increased by 0.01. This approach can, for example, find a mapping between "has the first name" and hasFirstName with a similarity of 1.0.

2.4 Multifarm

Hertuda is not designed for multilingual matching. Nevertheless, some simple correspondences are returned, such as person(en) ≡ person(de).

2.5 Library

In the Library track a relatively high recall was achieved (0.925). Splitting the words, however, leads to a very low precision (0.465).

2.6 Large Biomedical Ontologies

Hertuda was only able to complete the small tasks for FMA-NCI and FMA-SNOMED; the large tasks did not finish in time. A likely reason is that the complexity caused by computing the cross product of all concepts is too high.

3 General comments

3.1 Comments on the results

The approach shows that simple string-based algorithms can also yield good results. The improvement over version 1 is not large, but the recall is higher in many tracks. The precision is accordingly lower, yet this often results in better F-measure values.

3.2 Discussions on the way to improve the proposed system

To improve Hertuda it is possible to add more stop words in different languages. This helps when comparing two ontologies that share a language other than English. Another point is to set the thresholds more precisely instead of using a single value for all types. It is also conceivable to set the threshold based on the ontologies being matched. This would help to reduce the low precision in some tracks.

4 Conclusion

The results show that a string-based algorithm can also produce good alignments. The recall of this version is in many cases much higher than that of the first version. Thus it is possible to use this matcher as a preceding step for structural matchers.

References

1. Damerau, F.: A technique for computer detection and correction of spelling errors. Communications of the ACM 7(3) (1964) 171-176
2. Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. In: Soviet Physics Doklady. Volume 10. (1966) 707